AMS Without 4-Wise Independence on Product Domains
Authors
Abstract
In their fundamental work, Alon, Matias and Szegedy [3] presented celebrated sketching techniques and showed that 4-wise independence is sufficient to obtain good approximations. The question of what random functions are necessary is fundamental for streaming algorithms (see, e.g., Cormode and Muthukrishnan [9]). We present a somewhat surprising fact: on the product domain [n]^k, 4-wise independence can be omitted, at least for the L2 norm. That is, we prove that the sketch of Alon, Matias and Szegedy [3] works even if the 4-wise independent functions on [n]^k are replaced by the product of 4-wise independent functions on [n]. The main technical contribution of our paper is a novel combinatorial approach to analyzing the second moment (i.e., variance) of dependent sketches. Using our technique, we obtain a new result for the problem of measuring independence of datasets under the L2 norm. Measuring independence and k-wise independence is a fundamental problem that has multiple applications, and it has been the subject of intensive research during the last decade (see, among others, the recent work of Batu, Fortnow, Fischer, Kumar, Rubinfeld and White [4] and of Alon, Andoni, Kaufman, Matulef, Rubinfeld and Xie [1]). In the streaming environment, this problem was first addressed by Indyk and McGregor [15]. In this model the joint distribution is given empirically by a stream of elements, and the goal is to measure the distance between the joint distribution and the product distribution. The question of estimating k-wise independence on a stream of tuples, instead of pairs, is of central importance in multiple applications where data typically comes with multiple attributes, such as database entries, minute-to-minute changes in stock prices in a financial portfolio, and so on. Indyk and McGregor state, as an explicit open question in their paper, the problem of whether one can estimate k-wise independence on k-tuples for any k > 2.
In this paper, using the novel analysis of AMS, we answer the aforementioned open question of Indyk and McGregor [15] affirmatively for the L2 norm for any constant k. Our algorithm gives an (ε, δ)-approximation using a single pass over the data and O(3^k (1/ε^2) log(1/δ) (log n + log m)) memory bits. Because of the exponential bounds, our result is only of theoretical interest. In our recent paper [7] we address the problem of measuring pairwise and k-wise independence under the L1 norm. As a side remark, in [7] we use completely different methods, not applicable to the L2 norm, that rely on improved AMS analysis. This paper is a simplification of the previous versions [6].
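The construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the names `SignHash`, `ProductSketch` and `estimate_l2_squared` are hypothetical, the sign hash is a random cubic polynomial over a prime field (a standard way to get approximately 4-wise independent signs), and the group sizes are illustrative defaults rather than the parameters from the paper's analysis. The only point it shows is that the sign of a k-tuple is taken as the product of k per-coordinate signs instead of one 4-wise independent function on [n]^k.

```python
import random

P = (1 << 61) - 1  # Mersenne prime used as the field size (an assumption of this sketch)


class SignHash:
    """Approximately 4-wise independent map [n] -> {-1, +1} (random cubic over GF(P))."""

    def __init__(self, rng):
        self.coeffs = [rng.randrange(P) for _ in range(4)]

    def __call__(self, x):
        acc = 0
        for c in self.coeffs:  # Horner's rule for the degree-3 polynomial
            acc = (acc * x + c) % P
        # The low bit is very slightly biased because P is odd; negligible here.
        return 1 if acc & 1 else -1


class ProductSketch:
    """One AMS counter whose sign on a k-tuple is the product of k coordinate signs."""

    def __init__(self, k, rng):
        self.hashes = [SignHash(rng) for _ in range(k)]
        self.counter = 0

    def sign(self, item):
        s = 1
        for h, x in zip(self.hashes, item):
            s *= h(x)
        return s

    def update(self, item, weight=1):
        self.counter += weight * self.sign(item)

    def estimate(self):
        # The squared counter is the basic AMS estimator of the squared L2 norm.
        return self.counter ** 2


def estimate_l2_squared(stream, k, groups=9, per_group=64, seed=0):
    """Median of means over independent product sketches (illustrative sizes)."""
    rng = random.Random(seed)
    sketches = [[ProductSketch(k, rng) for _ in range(per_group)]
                for _ in range(groups)]
    for item, weight in stream:
        for group in sketches:
            for sk in group:
                sk.update(item, weight)
    means = sorted(sum(sk.estimate() for sk in g) / per_group for g in sketches)
    return means[len(means) // 2]
```

For instance, `estimate_l2_squared([((1, 2), 3), ((4, 5), -1)], k=2)` approximates 3^2 + (-1)^2 = 10. How many repetitions are needed for a given (ε, δ) guarantee is exactly what the paper's variance analysis addresses; the defaults here make no such claim.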
Similar resources
Testing Non-uniform k-Wise Independent Distributions over Product Spaces
A discrete distribution D over Σ_1 × ··· × Σ_n is called (non-uniform) k-wise independent if for any set of k indices {i_1, ..., i_k} and for any z_1 ∈ Σ_{i_1}, ..., z_k ∈ Σ_{i_k}, Pr_{X∼D}[X_{i_1} ··· X_{i_k} = z_1 ··· z_k] = Pr_{X∼D}[X_{i_1} = z_1] ··· Pr_{X∼D}[X_{i_k} = z_k]. We study the problem of testing (non-uniform) k-wise independent distributions over product spaces. For the uniform case ...
Robust characterizations of k-wise independence over product spaces and related testing results
A discrete distribution D over Σ_1 × ··· × Σ_n is called (non-uniform) k-wise independent if for any subset of k indices {i_1, ..., i_k} and for any z_1 ∈ Σ_{i_1}, ..., z_k ∈ Σ_{i_k}, Pr_{X∼D}[X_{i_1} ··· X_{i_k} = z_1 ··· z_k] = Pr_{X∼D}[X_{i_1} = z_1] ··· Pr_{X∼D}[X_{i_k} = z_k]. We study the problem of testing (non-uniform) k-wise independent distributions over product spaces. For the uniform case we show an upper bound...
Testing non-uniform k-wise independent distributions
A distribution D over Σ_1 × ··· × Σ_n is called (non-uniform) k-wise independent if for any set of k indices {i_1, ..., i_k} and for any z_1 ··· z_k ∈ Σ_{i_1} × ··· × Σ_{i_k}, Pr_{X∼D}[X_{i_1} ··· X_{i_k} = z_1 ··· z_k] = Pr_{X∼D}[X_{i_1} = z_1] ··· Pr_{X∼D}[X_{i_k} = z_k]. We study the problem of testing (non-uniform) k-wise independent distributions over product spaces. For the uniform case we show an upper bound on the ...
Global Solution of Fully-Observed Variational Bayesian Matrix Factorization is Column-Wise Independent
Variational Bayesian matrix factorization (VBMF) efficiently approximates the posterior distribution of factorized matrices by assuming matrix-wise independence of the two factors. A recent study on fully-observed VBMF showed that, under a stronger assumption that the two factorized matrices are column-wise independent, the global optimal solution can be analytically computed. However, it was n...
Measuring k-Wise Independence of Streaming Data under L2 Norm
Measuring independence and k-wise independence is a fundamental problem that has multiple applications and it has been the subject of intensive research during the last decade (see, among others, the recent work of Batu, Fortnow, Fischer, Kumar, Rubinfeld and White [11] and of Alon, Andoni, Kaufman, Matulef, Rubinfeld and Xie [2]). In the streaming environment, this problem was first addressed...
Publication date: 2010